PLSI Utilization for Automatic Thesaurus Construction

نویسندگان

  • Masato Hagiwara
  • Yasuhiro Ogawa
  • Katsuhiko Toyama
چکیده

When acquiring synonyms from large corpora, it is important to deal not only with such surface information as the context of the words but also their latent semantics. This paper describes how to utilize a latent semantic model PLSI to acquire synonyms automatically from large corpora. PLSI has been shown to achieve a better performance than conventional methods such as tf·idf and LSI, making it applicable to automatic thesaurus construction. Also, various PLSI techniques have been shown to be effective including: (1) use of Skew Divergence as a distance/similarity measure; (2) removal of words with low frequencies, and (3) multiple executions of PLSI and integration of the results.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Viii-1 Viii. an Experiment in Automatic Thesaurus Construction

A method is presented for the automatic construction of thesauruses used in information retrieval systems. The construction algorithm is based on the concept-concept associations displayed in a sample document collection.

متن کامل

Construction of Thematic Representations of Texts Based on Domain-Specific Thesaurus

The paper considers interrelations between lexical cohesion and the thematic structure of a text. The technique of automatic construction of the thematic representation of the text contexts is described. The technique uses knowledge from Sociopolitical thesaurus, which was specially developed as a tool for automatic text processing.

متن کامل

Improving Context Vector Models by Feature Clustering for Automatic Thesaurus Construction

Thesauruses are useful resources for NLP; however, manual construction of thesaurus is time consuming and suffers low coverage. Automatic thesaurus construction is developed to solve the problem. Conventional way to automatically construct thesaurus is by finding similar words based on context vector models and then organizing similar words into thesaurus structure. But the context vector metho...

متن کامل

Automatic thesaurus construction

In this paper we introduce a novel method of automating thesauri using syntactically constrained distributional similarity. With respect to syntactically conditioned cooccurrences, most popular approaches to automatic thesaurus construction simply ignore the salience of grammatical relations and effectively merge them into one united ‘context’. We distinguish semantic differences of each syntac...

متن کامل

Building Thesaurus from Manual Sources and Automatic Scanned Texts

This paper describes the work done in the TIPS project about the construction of a thesaurus base. This construction is a merge from a thesaurus manually built and one automatically extracted from large text corpora. Several manually built thesaurus have been semiformatted to be merged in a consistent common base. The automatic extraction is based on both syntax and statistics. We present in th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005